🚧Unsupervised BNNC

Neural Networks
Classification
Unsupervised Learning
Bayesian Neural Network Clustering for discovering hidden multivariate groupings.

General Principles

Building upon the Dirichlet Process Mixture Model (DPMM) and the multiclass classification BNN, the Bayesian Neural Network Clustering (BNNC) model learns to separate unlabelled datasets into at most K underlying mixture components using a neural gating network, without observing any ground-truth classes.

Rather than sampling discrete cluster assignments from categorical weights, as a standard Gaussian mixture model (GMM) does, BNNC routes each data point through a neural network that outputs continuous mixing probabilities \theta_i.

Considerations

Note
  • Marginalized Likelihood: The likelihood of each observation is obtained by summing the probability density over all K candidate multivariate Gaussian clusters, each weighted by the network’s mixing probabilities \theta_i. The sum is computed in log space with the LogSumExp trick for numerical stability.
  • Parsimony Prior: We place a Dirichlet prior Dir(\alpha / K) on the global cluster weights, whose logarithm is added to the gating network's logits. A concentration parameter \alpha < 1 acts as a strong sparsity constraint that drives the weights of unused clusters towards 0, approximating the behaviour of the stick-breaking construction in DPMMs.
  • Consensus Clustering: To obtain hard cluster assignments after MCMC sampling, we build an average adjacency (co-clustering) matrix recording the fraction of posterior draws in which each pair of points is assigned to the same cluster, then run hierarchical clustering on the complement distance matrix (1 minus the co-clustering matrix).
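The sparsity effect of the parsimony prior can be checked numerically. The sketch below is an illustration only, not part of the BI package: it draws weights from a symmetric Dirichlet with concentration \alpha / K using NumPy and counts how many components receive non-negligible mass. The helper name `mean_active_components`, the 0.01 threshold, and the comparison value \alpha = 50 are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 11  # same upper bound on clusters as in the example on this page

def mean_active_components(alpha, n_draws=2000, threshold=0.01):
    """Average number of components whose weight exceeds `threshold`."""
    # Symmetric Dirichlet with concentration alpha / K on every component
    draws = rng.dirichlet(np.full(K, alpha / K), size=n_draws)
    return (draws > threshold).sum(axis=1).mean()

sparse_active = mean_active_components(alpha=0.05)  # alpha < 1: sparse regime
dense_active = mean_active_components(alpha=50.0)   # alpha >> 1: mass spread out

print(f"alpha=0.05 -> {sparse_active:.2f} active components on average")
print(f"alpha=50   -> {dense_active:.2f} active components on average")
```

With a small \alpha, almost all prior mass sits on a single component per draw, so the data must provide evidence before additional clusters become active.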

Example

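Below is an example code snippet demonstrating Unsupervised BNNC using the Bayesian Inference (BI) package to recover hidden clusters in a synthetic dataset generated from 4 true clusters. The model itself is given K=11 as an upper bound on the number of clusters.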

Simulated Data

from BI import bi
import jax.numpy as jnp
import jax
import numpyro
import jax.scipy.stats as stats
from sklearn.datasets import make_blobs

# Setup device------------------------------------------------
m = bi(platform='cpu')

# Generate Synthetic Data ------------------------------------
# 4 distinct underlying true clusters
data, true_labels = make_blobs(
    n_samples=500, centers=4, cluster_std=0.8,
    center_box=(-10,10), random_state=101
)
m.data_on_model = dict(data=data)

# Define raw model ------------------------------------------------
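def bnn_mixture_model(data, K=11, D_H1=10):
    # data shape: (N, D_X); K is an upper bound on the number of clusters
    # (the synthetic data has 4), D_H1 is the hidden-layer width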
    N, D_X = data.shape
    
    # --- BNN Gating Network ---
    w1 = m.bnn.layer_linear(
        data, 
        dist=m.dist.normal(0, 1, name='w1_weight', shape=(D_X, D_H1)),
        activation='tanh'
    )
    
    w2 = m.bnn.layer_linear(
        w1,
        dist=m.dist.normal(0, 0.05, name='w2_weight', shape=(D_H1, K))
    )
    
    # Global parsimony prior: Dirichlet prior on global cluster weights
    # Alpha < 1 acts as a sparse prior to suppress inactive clusters
    alpha = 0.05
    pi = numpyro.sample('global_pi', numpyro.distributions.Dirichlet(jnp.ones(K) * (alpha / K)))
    
    logits = w2 + jnp.log(pi + 1e-10)
    log_p = jax.nn.log_softmax(logits, axis=-1)
    theta = jnp.exp(log_p)
    numpyro.deterministic('theta', theta)
    
    # --- Mixture Components ---
    mu = m.dist.normal(0, 5, name='mu', shape=(K, D_X))
    sigma = m.dist.exponential(1.0, name='sigma', shape=(K, D_X))
    
    # --- Marginalized Gaussian Mixture ---
    data_exp = jnp.expand_dims(data, axis=1) # (N, 1, D_X)
    mu_exp = jnp.expand_dims(mu, axis=0)     # (1, K, D_X)
    sigma_exp = jnp.expand_dims(sigma, axis=0) # (1, K, D_X)
    
    log_pdf_clusters = jnp.sum(stats.norm.logpdf(data_exp, loc=mu_exp, scale=sigma_exp), axis=-1) # (N, K)
    weighted_log_pdf = log_p + log_pdf_clusters # (N, K)
    
    # LogSumExp merges K clusters back together
    total_log_likelihood = jax.scipy.special.logsumexp(weighted_log_pdf, axis=-1)
    
    # Target Likelihood
    numpyro.factor('mixture_likelihood', jnp.sum(total_log_likelihood))

# Run MCMC
m.fit(bnn_mixture_model, num_chains=1)
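The LogSumExp step in the model above can be examined in isolation. The following sketch uses plain NumPy and made-up values (the `log_theta` and `log_pdf` arrays are hypothetical, not taken from the fitted model) to show why the mixture is summed in log space: for a point far from every cluster, the per-cluster densities underflow to zero and the naive computation returns negative infinity.

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x))) for a 1-D array."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

# Hypothetical values for one observation and three clusters
log_theta = np.log(np.array([0.7, 0.2, 0.1]))    # log mixing probabilities
log_pdf = np.array([-1200.0, -1205.0, -1210.0])  # per-cluster log densities
weighted = log_theta + log_pdf

with np.errstate(divide="ignore"):
    # exp() underflows to 0 for every cluster, so the log is -inf
    naive = np.log(np.sum(np.exp(weighted)))

# Subtracting the maximum first keeps the result finite
stable = logsumexp(weighted)

print("naive: ", naive)
print("stable:", stable)
```

A non-finite log-likelihood would break gradient-based samplers, which is one reason the model relies on `jax.scipy.special.logsumexp` rather than exponentiating the densities directly.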
Code
from BI import bi
from sklearn.datasets import make_blobs
m = bi()

# Generate Synthetic Data ------------------------------------
data, true_labels = make_blobs(
    n_samples=500, centers=4, cluster_std=0.8,
    center_box=(-10,10), random_state=101
)

# K is the maximum component limit, D_H1 is the hidden width
# empirical_bayes=True uses empirical priors instead of the fixed standard ones
m.data_on_model = dict(data=data, K=11, D_H1=10, empirical_bayes=True)

# Run MCMC
m.fit(m.models.bnnc, num_chains=1)

# Predict cluster targets
theta_samps, mu_samps, sigma_samps, labels = m.models.bnnc.predict(data, m.sampler)

# Plot Results
m.models.bnnc.plot(data, m.sampler)
BI v 0.0.45 package loaded
jax.local_device_count 32
⚠️This function is still in development. Use it with caution.⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution.⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution.⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution. ⚠️
  0%|          | 0/2000 [00:00<?, ?it/s]
⚠️This function is still in development. Use it with caution.⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution.⚠️
⚠️This function is still in development. Use it with caution. ⚠️
⚠️This function is still in development. Use it with caution. ⚠️
warmup:   0%|          | 1/2000 [00:00<22:19,  1.49it/s, 1 steps of size 2.34e+00. acc. prob=0.00]warmup:   0%|          | 10/2000 [00:00<02:16, 14.57it/s, 1023 steps of size 1.96e-02. acc. prob=0.60]warmup:   1%|          | 14/2000 [00:01<01:59, 16.59it/s, 1023 steps of size 1.08e-02. acc. prob=0.64]warmup:   1%|          | 17/2000 [00:01<02:21, 14.01it/s, 1023 steps of size 2.06e-02. acc. prob=0.68]warmup:   1%|          | 19/2000 [00:01<02:55, 11.30it/s, 1023 steps of size 2.04e-02. acc. prob=0.69]warmup:   1%|          | 21/2000 [00:01<03:22,  9.79it/s, 1023 steps of size 5.30e-02. acc. prob=0.72]warmup:   1%|          | 23/2000 [00:02<03:47,  8.70it/s, 1023 steps of size 1.05e-02. acc. prob=0.70]warmup:   1%|▏         | 25/2000 [00:02<04:05,  8.05it/s, 1023 steps of size 2.36e-02. acc. prob=0.72]warmup:   1%|▏         | 26/2000 [00:02<04:13,  7.78it/s, 1023 steps of size 1.58e-02. acc. prob=0.71]warmup:   1%|▏         | 27/2000 [00:02<04:21,  7.54it/s, 1023 steps of size 2.62e-02. acc. prob=0.72]warmup:   1%|▏         | 28/2000 [00:03<04:28,  7.36it/s, 1023 steps of size 1.96e-02. acc. prob=0.72]warmup:   1%|▏         | 29/2000 [00:03<04:33,  7.20it/s, 1023 steps of size 2.21e-02. acc. prob=0.72]warmup:   2%|▏         | 30/2000 [00:03<04:39,  7.04it/s, 1023 steps of size 1.76e-02. acc. prob=0.72]warmup:   2%|▏         | 31/2000 [00:03<04:42,  6.96it/s, 1023 steps of size 2.57e-02. acc. prob=0.73]warmup:   2%|▏         | 32/2000 [00:03<04:45,  6.90it/s, 1023 steps of size 3.91e-02. acc. prob=0.74]warmup:   2%|▏         | 33/2000 [00:03<04:49,  6.79it/s, 1023 steps of size 1.93e-02. acc. prob=0.73]warmup:   2%|▏         | 34/2000 [00:03<04:51,  6.75it/s, 1023 steps of size 2.77e-02. acc. prob=0.73]warmup:   2%|▏         | 35/2000 [00:04<04:52,  6.71it/s, 1023 steps of size 3.86e-02. acc. prob=0.74]warmup:   2%|▏         | 36/2000 [00:04<04:52,  6.72it/s, 1023 steps of size 5.96e-02. acc. 
prob=0.75]warmup:   2%|▏         | 38/2000 [00:04<03:48,  8.58it/s, 1023 steps of size 1.21e-02. acc. prob=0.73]warmup:   2%|▏         | 39/2000 [00:04<04:06,  7.97it/s, 1023 steps of size 1.98e-02. acc. prob=0.74]warmup:   2%|▏         | 40/2000 [00:04<04:17,  7.60it/s, 1023 steps of size 2.63e-02. acc. prob=0.74]warmup:   2%|▏         | 41/2000 [00:04<04:28,  7.29it/s, 1023 steps of size 2.90e-02. acc. prob=0.74]warmup:   2%|▏         | 42/2000 [00:04<04:35,  7.11it/s, 1023 steps of size 2.30e-02. acc. prob=0.74]warmup:   2%|▏         | 43/2000 [00:05<04:41,  6.95it/s, 1023 steps of size 3.28e-02. acc. prob=0.75]warmup:   2%|▏         | 44/2000 [00:05<04:46,  6.84it/s, 1023 steps of size 4.21e-02. acc. prob=0.75]warmup:   2%|▏         | 45/2000 [00:05<04:50,  6.73it/s, 1023 steps of size 6.88e-03. acc. prob=0.73]warmup:   2%|▏         | 46/2000 [00:05<04:49,  6.75it/s, 1023 steps of size 1.18e-02. acc. prob=0.74]warmup:   2%|▏         | 47/2000 [00:05<04:49,  6.74it/s, 1023 steps of size 1.66e-02. acc. prob=0.74]warmup:   2%|▏         | 48/2000 [00:05<04:51,  6.70it/s, 1023 steps of size 2.69e-02. acc. prob=0.75]warmup:   2%|▏         | 49/2000 [00:06<04:51,  6.70it/s, 1023 steps of size 1.68e-02. acc. prob=0.75]warmup:   2%|β–Ž         | 50/2000 [00:06<04:51,  6.68it/s, 1023 steps of size 2.13e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 51/2000 [00:06<04:53,  6.65it/s, 1023 steps of size 1.58e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 52/2000 [00:06<04:54,  6.62it/s, 1023 steps of size 2.18e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 53/2000 [00:06<04:52,  6.65it/s, 1023 steps of size 3.13e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 54/2000 [00:06<04:52,  6.64it/s, 1023 steps of size 2.02e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 55/2000 [00:06<04:52,  6.64it/s, 1023 steps of size 2.87e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 56/2000 [00:07<04:53,  6.63it/s, 1023 steps of size 1.78e-02. acc. 
prob=0.75]warmup:   3%|β–Ž         | 57/2000 [00:07<04:50,  6.68it/s, 1023 steps of size 2.72e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 58/2000 [00:07<04:49,  6.71it/s, 1023 steps of size 2.93e-02. acc. prob=0.76]warmup:   3%|β–Ž         | 59/2000 [00:07<04:51,  6.66it/s, 1023 steps of size 7.91e-03. acc. prob=0.75]warmup:   3%|β–Ž         | 60/2000 [00:07<04:51,  6.65it/s, 1023 steps of size 1.25e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 61/2000 [00:07<04:52,  6.62it/s, 1023 steps of size 2.01e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 62/2000 [00:07<04:52,  6.63it/s, 1023 steps of size 3.02e-02. acc. prob=0.76]warmup:   3%|β–Ž         | 63/2000 [00:08<04:55,  6.55it/s, 1023 steps of size 2.44e-02. acc. prob=0.76]warmup:   3%|β–Ž         | 64/2000 [00:08<04:53,  6.60it/s, 1023 steps of size 2.61e-02. acc. prob=0.76]warmup:   3%|β–Ž         | 65/2000 [00:08<04:52,  6.62it/s, 1023 steps of size 4.10e-02. acc. prob=0.76]warmup:   3%|β–Ž         | 66/2000 [00:08<04:52,  6.61it/s, 1023 steps of size 1.21e-02. acc. prob=0.75]warmup:   3%|β–Ž         | 67/2000 [00:08<04:51,  6.64it/s, 1023 steps of size 1.71e-02. acc. prob=0.76]warmup:   3%|β–Ž         | 68/2000 [00:08<04:51,  6.62it/s, 1023 steps of size 2.28e-02. acc. prob=0.76]warmup:   3%|β–Ž         | 69/2000 [00:09<04:52,  6.61it/s, 1023 steps of size 1.96e-02. acc. prob=0.76]warmup:   4%|β–Ž         | 70/2000 [00:09<04:53,  6.57it/s, 1023 steps of size 2.93e-02. acc. prob=0.76]warmup:   4%|β–Ž         | 71/2000 [00:09<04:51,  6.61it/s, 1023 steps of size 4.44e-02. acc. prob=0.76]warmup:   4%|β–Ž         | 72/2000 [00:09<04:51,  6.61it/s, 1023 steps of size 2.45e-02. acc. prob=0.76]warmup:   4%|β–Ž         | 73/2000 [00:09<04:50,  6.63it/s, 1023 steps of size 2.57e-02. acc. prob=0.76]warmup:   4%|▍         | 75/2000 [00:09<04:22,  7.32it/s, 1023 steps of size 2.62e-02. acc. prob=0.76]warmup:   4%|▍         | 76/2000 [00:10<04:29,  7.15it/s, 1023 steps of size 2.90e-02. acc. 
prob=0.76]warmup:   4%|▍         | 77/2000 [00:10<04:34,  6.99it/s, 1023 steps of size 3.04e-02. acc. prob=0.76]warmup:   4%|▍         | 78/2000 [00:10<04:38,  6.91it/s, 1023 steps of size 3.02e-02. acc. prob=0.76]warmup:   4%|▍         | 79/2000 [00:10<04:40,  6.85it/s, 1023 steps of size 3.46e-02. acc. prob=0.76]warmup:   4%|▍         | 80/2000 [00:10<04:42,  6.81it/s, 1023 steps of size 1.74e-02. acc. prob=0.76]warmup:   4%|▍         | 81/2000 [00:10<04:43,  6.77it/s, 1023 steps of size 1.66e-02. acc. prob=0.76]warmup:   4%|▍         | 82/2000 [00:10<04:45,  6.73it/s, 1023 steps of size 1.72e-02. acc. prob=0.76]warmup:   4%|▍         | 83/2000 [00:11<04:46,  6.70it/s, 1023 steps of size 2.05e-02. acc. prob=0.76]warmup:   4%|▍         | 84/2000 [00:11<04:45,  6.71it/s, 1023 steps of size 2.89e-02. acc. prob=0.76]warmup:   4%|▍         | 86/2000 [00:11<04:00,  7.95it/s, 975 steps of size 1.74e-02. acc. prob=0.76] warmup:   4%|▍         | 87/2000 [00:11<04:12,  7.57it/s, 1023 steps of size 1.64e-02. acc. prob=0.76]warmup:   4%|▍         | 88/2000 [00:11<04:22,  7.29it/s, 1023 steps of size 2.17e-02. acc. prob=0.76]warmup:   4%|▍         | 89/2000 [00:11<04:28,  7.12it/s, 1023 steps of size 1.88e-02. acc. prob=0.76]warmup:   4%|▍         | 90/2000 [00:12<04:34,  6.95it/s, 1023 steps of size 1.46e-02. acc. prob=0.76]warmup:   5%|▍         | 91/2000 [00:12<04:38,  6.85it/s, 1023 steps of size 2.12e-02. acc. prob=0.76]warmup:   5%|▍         | 92/2000 [00:12<04:41,  6.78it/s, 1023 steps of size 2.86e-02. acc. prob=0.77]warmup:   5%|▍         | 93/2000 [00:12<04:46,  6.67it/s, 1023 steps of size 2.90e-02. acc. prob=0.77]warmup:   5%|▍         | 94/2000 [00:12<04:45,  6.67it/s, 1023 steps of size 1.96e-02. acc. prob=0.76]warmup:   5%|▍         | 95/2000 [00:12<04:46,  6.65it/s, 1023 steps of size 2.52e-02. acc. prob=0.77]warmup:   5%|▍         | 96/2000 [00:12<04:47,  6.63it/s, 1023 steps of size 1.88e-02. acc. 
prob=0.76]warmup:   5%|▍         | 98/2000 [00:13<03:50,  8.24it/s, 1023 steps of size 3.00e-02. acc. prob=0.77]warmup:   5%|▍         | 99/2000 [00:13<04:06,  7.71it/s, 1023 steps of size 2.41e-02. acc. prob=0.77]warmup:   5%|β–Œ         | 100/2000 [00:13<04:17,  7.37it/s, 1023 steps of size 3.07e-02. acc. prob=0.77]warmup:   5%|β–Œ         | 108/2000 [00:13<01:25, 22.15it/s, 127 steps of size 8.31e-02. acc. prob=0.77] warmup:   6%|β–Œ         | 115/2000 [00:13<00:58, 32.46it/s, 255 steps of size 4.00e-02. acc. prob=0.77]warmup:   6%|β–Œ         | 119/2000 [00:13<01:01, 30.46it/s, 127 steps of size 6.57e-02. acc. prob=0.77]warmup:   6%|β–‹         | 126/2000 [00:13<00:48, 38.26it/s, 127 steps of size 8.58e-02. acc. prob=0.77]warmup:   7%|β–‹         | 133/2000 [00:14<00:41, 45.16it/s, 63 steps of size 2.79e-02. acc. prob=0.77] warmup:   7%|β–‹         | 139/2000 [00:14<00:38, 48.61it/s, 31 steps of size 7.12e-02. acc. prob=0.77]warmup:   7%|β–‹         | 148/2000 [00:14<00:31, 59.17it/s, 63 steps of size 4.40e-02. acc. prob=0.77]warmup:   8%|β–Š         | 158/2000 [00:14<00:26, 69.98it/s, 31 steps of size 1.21e-02. acc. prob=0.77]warmup:   8%|β–Š         | 166/2000 [00:14<00:32, 55.77it/s, 127 steps of size 5.64e-02. acc. prob=0.77]warmup:   9%|β–Š         | 173/2000 [00:14<00:31, 57.81it/s, 127 steps of size 4.58e-02. acc. prob=0.77]warmup:   9%|β–‰         | 180/2000 [00:14<00:33, 54.25it/s, 511 steps of size 2.18e-02. acc. prob=0.77]warmup:   9%|β–‰         | 187/2000 [00:14<00:33, 54.88it/s, 255 steps of size 3.96e-02. acc. prob=0.77]warmup:  10%|β–‰         | 196/2000 [00:15<00:28, 62.94it/s, 127 steps of size 7.25e-02. acc. prob=0.78]warmup:  10%|β–ˆ         | 207/2000 [00:15<00:24, 74.13it/s, 63 steps of size 7.01e-02. acc. prob=0.78] warmup:  11%|β–ˆ         | 218/2000 [00:15<00:23, 76.49it/s, 255 steps of size 3.02e-02. acc. prob=0.78]warmup:  11%|β–ˆβ–        | 228/2000 [00:15<00:21, 80.78it/s, 127 steps of size 5.32e-02. acc. 
prob=0.78]warmup:  12%|β–ˆβ–        | 238/2000 [00:15<00:20, 85.25it/s, 127 steps of size 6.32e-02. acc. prob=0.78]warmup:  12%|β–ˆβ–        | 247/2000 [00:15<00:20, 85.00it/s, 63 steps of size 1.04e-01. acc. prob=0.78] warmup:  13%|β–ˆβ–Ž        | 256/2000 [00:15<00:21, 82.97it/s, 127 steps of size 7.39e-02. acc. prob=0.78]warmup:  13%|β–ˆβ–Ž        | 265/2000 [00:15<00:21, 81.58it/s, 63 steps of size 9.97e-02. acc. prob=0.78] warmup:  14%|β–ˆβ–Ž        | 274/2000 [00:15<00:23, 74.70it/s, 31 steps of size 3.80e-02. acc. prob=0.78]warmup:  14%|β–ˆβ–        | 285/2000 [00:16<00:20, 83.13it/s, 63 steps of size 1.45e-01. acc. prob=0.78]warmup:  15%|β–ˆβ–        | 298/2000 [00:16<00:17, 94.88it/s, 63 steps of size 7.87e-02. acc. prob=0.78]warmup:  15%|β–ˆβ–Œ        | 308/2000 [00:16<00:19, 85.50it/s, 63 steps of size 9.73e-02. acc. prob=0.78]warmup:  16%|β–ˆβ–Œ        | 317/2000 [00:16<00:19, 86.62it/s, 63 steps of size 1.18e-01. acc. prob=0.78]warmup:  16%|β–ˆβ–‹        | 328/2000 [00:16<00:18, 89.40it/s, 127 steps of size 7.10e-02. acc. prob=0.78]warmup:  17%|β–ˆβ–‹        | 338/2000 [00:16<00:18, 88.36it/s, 63 steps of size 7.11e-02. acc. prob=0.78] warmup:  17%|β–ˆβ–‹        | 349/2000 [00:16<00:17, 92.13it/s, 63 steps of size 3.09e-02. acc. prob=0.78]warmup:  18%|β–ˆβ–Š        | 359/2000 [00:16<00:19, 85.99it/s, 63 steps of size 7.23e-02. acc. prob=0.78]warmup:  18%|β–ˆβ–Š        | 369/2000 [00:17<00:18, 85.84it/s, 127 steps of size 5.76e-02. acc. prob=0.78]warmup:  19%|β–ˆβ–‰        | 381/2000 [00:17<00:17, 92.84it/s, 63 steps of size 6.76e-02. acc. prob=0.78] warmup:  20%|β–ˆβ–‰        | 392/2000 [00:17<00:17, 93.16it/s, 127 steps of size 5.90e-02. acc. prob=0.78]warmup:  20%|β–ˆβ–ˆ        | 404/2000 [00:17<00:16, 95.68it/s, 127 steps of size 3.50e-02. acc. prob=0.78]warmup:  21%|β–ˆβ–ˆ        | 414/2000 [00:17<00:17, 90.79it/s, 63 steps of size 7.95e-02. acc. 
prob=0.78] warmup:  21%|β–ˆβ–ˆ        | 424/2000 [00:17<00:18, 85.04it/s, 63 steps of size 8.37e-02. acc. prob=0.78]warmup:  22%|β–ˆβ–ˆβ–       | 437/2000 [00:17<00:16, 92.57it/s, 127 steps of size 6.03e-02. acc. prob=0.78]warmup:  22%|β–ˆβ–ˆβ–       | 447/2000 [00:17<00:17, 87.82it/s, 63 steps of size 7.47e-02. acc. prob=0.79] warmup:  23%|β–ˆβ–ˆβ–Ž       | 456/2000 [00:18<00:20, 75.22it/s, 255 steps of size 2.21e-02. acc. prob=0.78]warmup:  23%|β–ˆβ–ˆβ–Ž       | 464/2000 [00:18<00:21, 72.49it/s, 63 steps of size 1.48e-01. acc. prob=0.78] warmup:  24%|β–ˆβ–ˆβ–       | 475/2000 [00:18<00:19, 80.10it/s, 127 steps of size 8.52e-02. acc. prob=0.78]warmup:  24%|β–ˆβ–ˆβ–       | 484/2000 [00:18<00:20, 75.07it/s, 63 steps of size 6.94e-02. acc. prob=0.78] warmup:  25%|β–ˆβ–ˆβ–       | 495/2000 [00:18<00:18, 82.00it/s, 127 steps of size 4.47e-02. acc. prob=0.78]warmup:  25%|β–ˆβ–ˆβ–Œ       | 505/2000 [00:18<00:17, 85.96it/s, 31 steps of size 3.37e-02. acc. prob=0.78] warmup:  26%|β–ˆβ–ˆβ–Œ       | 514/2000 [00:18<00:18, 79.47it/s, 127 steps of size 5.23e-02. acc. prob=0.78]warmup:  26%|β–ˆβ–ˆβ–Œ       | 524/2000 [00:18<00:17, 82.78it/s, 63 steps of size 1.19e-01. acc. prob=0.79] warmup:  27%|β–ˆβ–ˆβ–‹       | 535/2000 [00:18<00:16, 87.91it/s, 63 steps of size 1.91e-02. acc. prob=0.78]warmup:  27%|β–ˆβ–ˆβ–‹       | 544/2000 [00:19<00:18, 79.27it/s, 63 steps of size 1.03e-01. acc. prob=0.79]warmup:  28%|β–ˆβ–ˆβ–Š       | 555/2000 [00:19<00:16, 86.57it/s, 63 steps of size 1.15e-01. acc. prob=0.79]warmup:  28%|β–ˆβ–ˆβ–Š       | 564/2000 [00:19<00:16, 86.22it/s, 63 steps of size 8.60e-02. acc. prob=0.79]warmup:  29%|β–ˆβ–ˆβ–Š       | 574/2000 [00:19<00:16, 87.97it/s, 127 steps of size 6.80e-02. acc. prob=0.79]warmup:  29%|β–ˆβ–ˆβ–‰       | 583/2000 [00:19<00:17, 79.36it/s, 63 steps of size 1.04e-01. acc. prob=0.79] warmup:  30%|β–ˆβ–ˆβ–‰       | 593/2000 [00:19<00:17, 82.67it/s, 63 steps of size 6.08e-02. acc. 
prob=0.79]warmup:  30%|β–ˆβ–ˆβ–ˆ       | 602/2000 [00:19<00:16, 83.72it/s, 127 steps of size 5.74e-02. acc. prob=0.79]warmup:  31%|β–ˆβ–ˆβ–ˆ       | 614/2000 [00:19<00:15, 91.59it/s, 63 steps of size 5.48e-02. acc. prob=0.79] warmup:  31%|β–ˆβ–ˆβ–ˆ       | 624/2000 [00:19<00:15, 90.97it/s, 63 steps of size 7.91e-02. acc. prob=0.79]warmup:  32%|β–ˆβ–ˆβ–ˆβ–      | 634/2000 [00:20<00:15, 89.97it/s, 63 steps of size 4.73e-02. acc. prob=0.79]warmup:  32%|β–ˆβ–ˆβ–ˆβ–      | 644/2000 [00:20<00:14, 91.98it/s, 63 steps of size 1.19e-01. acc. prob=0.79]warmup:  33%|β–ˆβ–ˆβ–ˆβ–Ž      | 655/2000 [00:20<00:14, 96.06it/s, 63 steps of size 8.16e-02. acc. prob=0.79]warmup:  33%|β–ˆβ–ˆβ–ˆβ–Ž      | 665/2000 [00:20<00:14, 90.57it/s, 127 steps of size 4.40e-02. acc. prob=0.79]warmup:  34%|β–ˆβ–ˆβ–ˆβ–      | 675/2000 [00:20<00:14, 91.22it/s, 31 steps of size 8.34e-02. acc. prob=0.79] warmup:  34%|β–ˆβ–ˆβ–ˆβ–      | 685/2000 [00:20<00:14, 92.72it/s, 63 steps of size 6.44e-02. acc. prob=0.79]warmup:  35%|β–ˆβ–ˆβ–ˆβ–      | 696/2000 [00:20<00:13, 95.84it/s, 63 steps of size 8.59e-02. acc. prob=0.79]warmup:  35%|β–ˆβ–ˆβ–ˆβ–Œ      | 708/2000 [00:20<00:13, 99.18it/s, 127 steps of size 4.36e-02. acc. prob=0.79]warmup:  36%|β–ˆβ–ˆβ–ˆβ–Œ      | 718/2000 [00:20<00:13, 94.44it/s, 63 steps of size 9.24e-02. acc. prob=0.79] warmup:  36%|β–ˆβ–ˆβ–ˆβ–‹      | 728/2000 [00:21<00:13, 93.88it/s, 63 steps of size 7.37e-02. acc. prob=0.79]warmup:  37%|β–ˆβ–ˆβ–ˆβ–‹      | 738/2000 [00:21<00:14, 89.00it/s, 63 steps of size 1.09e-01. acc. prob=0.79]warmup:  37%|β–ˆβ–ˆβ–ˆβ–‹      | 748/2000 [00:21<00:13, 90.79it/s, 63 steps of size 8.87e-02. acc. prob=0.79]warmup:  38%|β–ˆβ–ˆβ–ˆβ–Š      | 761/2000 [00:21<00:12, 99.76it/s, 63 steps of size 7.90e-02. acc. prob=0.79]warmup:  39%|β–ˆβ–ˆβ–ˆβ–Š      | 772/2000 [00:21<00:12, 97.31it/s, 63 steps of size 8.74e-02. acc. prob=0.79]warmup:  39%|β–ˆβ–ˆβ–ˆβ–‰      | 782/2000 [00:21<00:12, 98.04it/s, 127 steps of size 4.95e-02. acc. 
prob=0.79]warmup:  40%|β–ˆβ–ˆβ–ˆβ–‰      | 792/2000 [00:21<00:12, 95.65it/s, 63 steps of size 7.66e-02. acc. prob=0.79] warmup:  40%|β–ˆβ–ˆβ–ˆβ–ˆ      | 805/2000 [00:21<00:11, 102.55it/s, 63 steps of size 6.78e-02. acc. prob=0.79]warmup:  41%|β–ˆβ–ˆβ–ˆβ–ˆ      | 816/2000 [00:21<00:11, 102.37it/s, 63 steps of size 9.89e-02. acc. prob=0.79]warmup:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–     | 828/2000 [00:22<00:11, 106.03it/s, 31 steps of size 1.22e-01. acc. prob=0.79]warmup:  42%|β–ˆβ–ˆβ–ˆβ–ˆβ–     | 840/2000 [00:22<00:10, 108.16it/s, 63 steps of size 8.73e-02. acc. prob=0.79]warmup:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž     | 851/2000 [00:22<00:10, 108.17it/s, 63 steps of size 8.23e-02. acc. prob=0.79]warmup:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž     | 862/2000 [00:22<00:10, 105.85it/s, 63 steps of size 8.61e-02. acc. prob=0.79]warmup:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž     | 874/2000 [00:22<00:10, 108.26it/s, 63 steps of size 7.51e-02. acc. prob=0.79]warmup:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–     | 886/2000 [00:22<00:10, 111.18it/s, 63 steps of size 8.34e-02. acc. prob=0.79]warmup:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–     | 898/2000 [00:22<00:10, 105.53it/s, 63 steps of size 5.36e-02. acc. prob=0.79]warmup:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 909/2000 [00:22<00:10, 103.95it/s, 63 steps of size 6.75e-02. acc. prob=0.79]warmup:  46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ     | 920/2000 [00:22<00:10, 104.94it/s, 63 steps of size 9.00e-02. acc. prob=0.79]warmup:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 931/2000 [00:23<00:10, 104.36it/s, 63 steps of size 7.71e-02. acc. prob=0.79]warmup:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 942/2000 [00:23<00:10, 104.54it/s, 63 steps of size 7.47e-02. acc. prob=0.79]warmup:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š     | 955/2000 [00:23<00:09, 106.18it/s, 127 steps of size 3.78e-02. acc. prob=0.79]warmup:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š     | 966/2000 [00:23<00:11, 87.94it/s, 255 steps of size 3.36e-02. acc. prob=0.79] warmup:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 976/2000 [00:23<00:12, 82.93it/s, 63 steps of size 8.25e-02. acc. 
prob=0.79] warmup:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 985/2000 [00:23<00:12, 84.56it/s, 63 steps of size 4.03e-02. acc. prob=0.79]warmup:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 994/2000 [00:23<00:12, 80.10it/s, 31 steps of size 2.41e-02. acc. prob=0.79]sample:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 1003/2000 [00:23<00:12, 80.21it/s, 63 steps of size 5.54e-02. acc. prob=0.98]sample:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 1014/2000 [00:24<00:11, 85.99it/s, 63 steps of size 5.54e-02. acc. prob=0.97]sample:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 1025/2000 [00:24<00:10, 90.34it/s, 63 steps of size 5.54e-02. acc. prob=0.97]sample:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 1036/2000 [00:24<00:10, 93.41it/s, 63 steps of size 5.54e-02. acc. prob=0.95]sample:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 1046/2000 [00:24<00:10, 94.57it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 1057/2000 [00:24<00:09, 96.83it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 1068/2000 [00:24<00:09, 98.48it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 1079/2000 [00:24<00:09, 99.55it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 1090/2000 [00:24<00:09, 99.95it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ    | 1101/2000 [00:24<00:08, 99.92it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ    | 1112/2000 [00:25<00:08, 100.31it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ    | 1123/2000 [00:25<00:08, 101.08it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹    | 1134/2000 [00:25<00:08, 100.69it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹    | 1145/2000 [00:25<00:08, 100.72it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š    | 1156/2000 [00:25<00:08, 100.88it/s, 63 steps of size 5.54e-02. acc. 
prob=0.94]sample:  58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š    | 1167/2000 [00:25<00:08, 101.03it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰    | 1178/2000 [00:25<00:08, 101.00it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰    | 1189/2000 [00:25<00:08, 101.15it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 1200/2000 [00:25<00:07, 101.29it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 1211/2000 [00:26<00:07, 100.39it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 1222/2000 [00:26<00:07, 100.41it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 1233/2000 [00:26<00:07, 100.54it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 1244/2000 [00:26<00:07, 101.04it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   | 1255/2000 [00:26<00:07, 100.89it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž   | 1266/2000 [00:26<00:07, 100.61it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 1277/2000 [00:26<00:07, 101.45it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 1288/2000 [00:26<00:07, 101.47it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 1299/2000 [00:26<00:06, 101.68it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ   | 1310/2000 [00:26<00:06, 101.75it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ   | 1321/2000 [00:27<00:06, 101.75it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹   | 1332/2000 [00:27<00:06, 101.53it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹   | 1343/2000 [00:27<00:06, 101.11it/s, 63 steps of size 5.54e-02. acc. 
prob=0.94]sample:  68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š   | 1354/2000 [00:27<00:06, 101.01it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š   | 1365/2000 [00:27<00:06, 101.18it/s, 63 steps of size 5.54e-02. acc. prob=0.94]sample:  69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰   | 1376/2000 [00:27<00:06, 101.31it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰   | 1387/2000 [00:27<00:06, 101.81it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰   | 1398/2000 [00:27<00:05, 102.05it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ   | 1409/2000 [00:27<00:05, 100.88it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ   | 1420/2000 [00:28<00:05, 100.65it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 1431/2000 [00:28<00:05, 100.78it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 1442/2000 [00:28<00:05, 101.01it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž  | 1453/2000 [00:28<00:05, 100.83it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž  | 1464/2000 [00:28<00:05, 100.31it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 1475/2000 [00:28<00:05, 100.78it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 1486/2000 [00:28<00:05, 100.19it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 1497/2000 [00:28<00:05, 100.29it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 1508/2000 [00:28<00:04, 100.45it/s, 63 steps of size 5.54e-02. acc. prob=0.93]sample:  76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 1519/2000 [00:29<00:04, 100.76it/s, 63 steps of size 5.54e-02. acc. 
prob=0.93]
sample: 100%|██████████| 2000/2000 [00:33<00:00, 59.03it/s, 63 steps of size 5.54e-02. acc. prob=0.93]
⚠️ This function is still in development. Use it with caution. ⚠️
Model found 4 clusters.

Mathematical Details

In the Bayesian formulation, we place priors on all weights and biases and define a likelihood for the output. For a BNNC model with K mixture components and a D_X-vector of predictors, the model is specified below. For the code example, we consider a single-hidden-layer gating network with a hyperbolic tangent (\tanh) activation function. Because the input matrix X incorporates the intercept as its first column, the bias term is implicitly included in the layer's weights:

\begin{aligned} \log p(X_i \mid \mu, \sigma, \theta) & = \text{LogSumExp} \left[ \log(\theta_{ik}) + \log \mathcal{N}(X_i \mid \mu_k, \sigma_k) \right]_{k=1}^K \\ \theta_i & = \text{Softmax}(\phi_i) \\ \phi_i & = H_i W_2 + \log(\pi) \\ H_i & = \tanh(X_i W_1) \\ \pi & \sim \text{Dirichlet}(\frac{\alpha}{K}, \dots, \frac{\alpha}{K}) \\ W_1 & \sim \text{Normal}(0, 1) \\ W_2 & \sim \text{Normal}(0, 0.05) \\ \mu_k & \sim \text{Normal}(0, 5) \\ \sigma_k & \sim \text{Exponential}(1) \\ \end{aligned}

where:

  • X_i is the observed data vector for the i-th observation.
  • \theta_{ik} is the probability of assigning observation i to component k.
  • \pi is the global cluster weight vector, which enforces parsimony via a Dirichlet prior.
  • \alpha is the concentration parameter (e.g., 0.05); values below 1 drive the weights of unused clusters towards zero.
  • H_i is the hidden layer representation vector for the i-th observation (D_H = 10).
  • W_1 (D_X \times D_H) and W_2 (D_H \times K) are the weight matrices of the gating network.
  • \mu_k and \sigma_k are the mean and standard deviation for the k-th Gaussian mixture component.
  • All gating network weights are assigned independent Normal priors.
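
The equations above can be traced numerically with a minimal JAX sketch. It is an illustration under stated assumptions, not the BI package's implementation: the function name `bnnc_log_likelihood` is hypothetical, each component is treated as a diagonal-covariance Gaussian (one \sigma per feature), the intercept column is omitted for brevity, and \pi, \mu and \sigma are fixed or drawn once rather than sampled by MCMC.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp
from jax.scipy.stats import norm


def bnnc_log_likelihood(X, W1, W2, pi, mu, sigma):
    """Return (log p(X_i), theta) for every observation.

    X: (N, D_X); W1: (D_X, D_H); W2: (D_H, K); pi: (K,);
    mu, sigma: (K, D_X) per-component means and standard deviations
    (diagonal covariance is an assumption of this sketch).
    """
    H = jnp.tanh(X @ W1)                            # hidden layer, (N, D_H)
    phi = H @ W2 + jnp.log(pi)                      # gating logits, (N, K)
    log_theta = jax.nn.log_softmax(phi, axis=-1)    # log mixing probabilities
    # Sum per-feature log-densities to get log N(X_i | mu_k, sigma_k), (N, K)
    log_comp = norm.logpdf(X[:, None, :], mu[None], sigma[None]).sum(-1)
    loglik = logsumexp(log_theta + log_comp, axis=-1)  # marginalize over k
    return loglik, jnp.exp(log_theta)


# Illustrative sizes: N=5 observations, D_X=2, D_H=10, K=4
N, D_X, D_H, K = 5, 2, 10, 4
k_x, k_w1, k_w2, k_mu = jax.random.split(jax.random.PRNGKey(0), 4)

X = jax.random.normal(k_x, (N, D_X))
W1 = jax.random.normal(k_w1, (D_X, D_H))            # W_1 ~ Normal(0, 1)
W2 = 0.05 * jax.random.normal(k_w2, (D_H, K))       # W_2 ~ Normal(0, 0.05)
mu = 5.0 * jax.random.normal(k_mu, (K, D_X))        # mu_k ~ Normal(0, 5)
sigma = jnp.ones((K, D_X))                          # unit scale for simplicity
pi = jnp.array([0.6, 0.3, 0.09, 0.01])              # hand-picked sparse weights

loglik, theta = bnnc_log_likelihood(X, W1, W2, pi, mu, sigma)
print(loglik.shape, theta.shape)
```

Since \text{Softmax}(a + \log \pi)_k \propto \pi_k \exp(a_k), the tight prior on W_2 keeps each \theta_i close to the global weights \pi unless the data support moving away from them, so a component with near-zero \pi_k stays effectively switched off for every observation.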

Reference(s)

  1. Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.